class: center, middle, inverse, title-slide

.title[
# Genomic Variation
]
.author[
### Ratnesh Bhai Mehta • 28-Apr-2023
]
.institute[
### Zifo RnD Solutions
]

---
exclude: true
count: false

<link href="https://fonts.googleapis.com/css?family=Roboto|Source+Sans+Pro:300,400,600|Ubuntu+Mono&subset=latin-ext" rel="stylesheet">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.3.1/css/all.css" integrity="sha384-mzrmE5qonljUremFsqc01SB46JvROS7bZs3IO2EmfFsd15uHvIt+Y8vEf7N7fWAU" crossorigin="anonymous">

<!-- ------------ Only edit title, subtitle & author above this ------------ -->

---

## Topics covered

<br>

* What is genetic variation?
* Types of genetic variation studies
* Variant identification and analysis

---

## Genetic variation

<br>

### Germline vs Somatic Variants

<br>

<img src="data:image/png;base64,#data/genvar/Picture1.jpg" width="85%" style="display: block; margin: auto;" />

---

## Genetic variation

<br>

### What makes us unique?

<br>

<img src="data:image/png;base64,#data/genvar/Picture2.jpg" width="85%" style="display: block; margin: auto;" />

---

## Genetic variation

<br>

### Types of variation

* **RFLP: Restriction Fragment Length Polymorphism**
* **VNTR: Variable Number of Tandem Repeats**
    * or minisatellite
    * ~10-100 bp core unit
* **SSR: Simple Sequence Repeat**
    * or STR (short tandem repeat)
    * or microsatellite
    * ~1-5 bp core unit
* **SNP: Single Nucleotide Polymorphism**
    * Often used loosely to also include rare single-nucleotide variants (SNVs)
* **Insertions or deletions**
    * INDEL: a small (few-nucleotide) insertion or deletion
* **Rearrangement** (inversion, duplication, complex rearrangement)
    * CNV: **C**opy **N**umber **V**ariation

---

## Genetic variation

<br>

### Single Nucleotide Polymorphism and Mutation

* **Genetic polymorphism**
    * Common variation in the population:
        * Phenotype (eye color, height, etc.)
        * Genotype (DNA sequence polymorphism)
    * Frequency of the minor allele(s) ≥ 1%
* **DNA sequence variation**
    * Most common allele ≤ 0.99 (polymorphism)
    * Minor allele ≥ 1%
    * Rare variant: minor allele < 1%
* **DNA mutation: any change in DNA sequence**
    * Silent vs. amino acid substitution vs. other
    * Neutral vs. disease-causing
    * ~1 × 10<sup>-8</sup> per bp per generation (~70 new mutations per individual)
* **Common but incorrect usage**
    * “Mutation” vs. “Polymorphism”

---

## Genetic variation

<br>

### Mutation

.pull-left-60[
* A mutation is a change in the “normal” base-pair sequence.
* It can be:
    * A single base-pair substitution
    * A deletion or insertion of one or more base pairs (indel)
    * A larger deletion/insertion or rearrangement

<br>
<br>

<img src="data:image/png;base64,#data/genvar/Picture3.jpg" width="100%" style="display: block; margin: auto;" />
]
.pull-right-40[
<img src="data:image/png;base64,#data/genvar/Picture4.jpg" width="30%" style="display: block; margin: auto;" />
]

---

## Genetic variation

<br>

### Silent Sequence Change (Synonymous SNP)

<br>

<img src="data:image/png;base64,#data/genvar/Picture5.jpg" width="85%" style="display: block; margin: auto;" />

---

## Genetic variation

<br>

### Missense Mutation (Non-Synonymous SNP)

<br>

<img src="data:image/png;base64,#data/genvar/Picture6.jpg" width="85%" style="display: block; margin: auto;" />

---

## Genetic variation

<br>

### Nonsense Mutation (Non-Synonymous SNP)

<br>

<img src="data:image/png;base64,#data/genvar/Picture7.jpg" width="85%" style="display: block; margin: auto;" />

---

## Genetic variation

<br>

### Frameshift Mutation (Insertion/Deletion)

<br>

<img src="data:image/png;base64,#data/genvar/Picture8.jpg" width="85%" style="display: block; margin: auto;" />

---

## Genetic variation

<br>

### Splice Site Mutation

<br>

<img src="data:image/png;base64,#data/genvar/Picture9.jpg" width="85%" style="display: block; margin: auto;" />

---

## Genetic variation

<br>

### Other Variant Types

<br>

<img src="data:image/png;base64,#data/genvar/Picture10.jpg" width="85%" style="display: block; margin: auto;" />

---

## Genetic variation

<br>

### Size Spectrum of Human Sequence Variation

<br>

<img src="data:image/png;base64,#data/genvar/Picture11.jpg" width="100%" style="display: block; margin: auto;" />

---

## Genetic Variation Studies

<br>

### Genome-Wide Association Studies (GWAS)

<br>

.pull-left-50[
* Individuals are genotyped at common variants across the genome using genome-wide SNP arrays.
* Variants associated with a trait, or lying in the same haplotype as such a variant, are found at a higher frequency in cases than in controls (or a lower one, for protective alleles).
* A statistical test is run for each variant to measure the evidence for association with the trait. The p-value is the probability of observing an association at least this strong if the variant had no real effect; p < 5 × 10<sup>-8</sup> is the commonly used genome-wide significance threshold.
]
.pull-right-50[
<img src="data:image/png;base64,#data/genvar/Picture12.jpg" width="100%" style="display: block; margin: auto;" />
]

---

## Genetic Variation Studies

<br>

### Functional Genetic Variation Studies

<br>

* **Aim:** understand the molecular mechanisms and pathways that link genotype to phenotype.
* Simple variants that alter the translated protein sequence, such as missense, splice-site, stop-gained and stop-lost variants, can have functional consequences by:
    * Altering ligand and/or co-factor binding sites
    * Altering the native protein structure by:
        * Removing or adding cysteine residues, which can change disulfide-bond patterns
        * Altering the normal formation of secondary structure elements or their interactions (sickle cell anemia is an example of this)
    * Disrupting normal interactions within protein complexes or with other cellular components
    * Removing or adding post-translational modification sites
* Applications: personalized and precision medicine, variant interpretation under the ACMG guidelines

---

## Genetic Variation Studies

<br>

### Population Genetics

<br>

* Study of variation within populations of individuals.
* Data from genome-scale population genetics studies have been used to:

<br>

<img src="data:image/png;base64,#data/genvar/Picture13.jpg" width="100%" style="display: block; margin: auto;" />

---

## Variant Identification and Analysis

<br>

### Technologies

<br>

* **SNP Array**
* **Next Generation Sequencing**
    * Gene Panel Sequencing
    * Whole Exome Sequencing (WES)
    * Whole Genome Sequencing (WGS)

---

## Variant Identification and Analysis

<br>

### Microarray

<br>

.pull-left-55[
* **Usually a glass microscope slide, a silicon chip or a nylon membrane.**
* **The surface carries thousands of microscopic probe spots at defined positions.**
* **Lets researchers analyze thousands of genes in a single experiment.**
* **Various types:**
    * DNA microarrays, MMChips, Protein microarrays, Peptide microarrays, Tissue microarrays, Cellular microarrays, Chemical compound microarrays, Antibody microarrays, Carbohydrate microarrays, Phenotype microarrays, Reverse phase protein microarrays, Interferometric reflectance imaging sensor or IRIS.
]
.pull-right-45[
<img src="data:image/png;base64,#data/genvar/Picture14.jpg" width="80%" style="display: block; margin: auto;" />
]

---

## Variant Identification and Analysis

<br>

### DNA microarray

<br>

* **Types:** cDNA microarrays, oligo DNA microarrays, BAC microarrays and SNP microarrays.
* SNP microarrays **work on the principle of DNA hybridization, in which a single-base change can be detected through fluorescence chemistry.**
* Applications:
    * Haplotype and gene mapping
    * Cancer research
    * Personalized genetic research
    * Genetic medicine research
    * Genome-wide association studies
* A SNP array experiment has three common steps:
    * Immobilizing oligonucleotides/probes (making the chip)
    * Fragmenting and labelling the nucleic acid
    * Hybridization

---

## Variant Identification and Analysis

<br>

### Major Techniques for Detection of SNPs Using Microarrays

<br>

<img src="data:image/png;base64,#data/genvar/Picture15.jpg" width="70%" style="display: block; margin: auto;" />

---

## Variant Identification and Analysis

<br>

### SNP Array Process Flow

<br>

<img src="data:image/png;base64,#data/genvar/Picture16.jpg" width="50%" style="display: block; margin: auto;" />

---

## Variant Identification and Analysis

<br>

### Raw Intensity File to VCF (Illumina)

<br>

<img src="data:image/png;base64,#data/genvar/Picture17.jpg" width="90%" style="display: block; margin: auto;" />

---

## Variant Identification and Analysis

<br>

### Raw Intensity File to VCF (Illumina & Affymetrix)

<br>

<img src="data:image/png;base64,#data/genvar/Picture18.jpg" width="85%" style="display: block; margin: auto;" />

---

## Variant Identification and Analysis

<br>

### Next Generation Sequencing (NGS)

<br>

<img src="data:image/png;base64,#data/genvar/Picture19.jpg" width="85%" style="display: block; margin: auto;" />

---

## Variant Identification and Analysis

<br>

### Framework for Variant Discovery (NGS)

<br>

<img src="data:image/png;base64,#data/genvar/Picture20.jpg" width="70%" style="display: block; margin: auto;" />

---

## Variant Identification and Analysis

<br>

### Mapping (NGS)

.pull-left-50[
* **Place reads with an initial alignment on the reference genome using mapping algorithms.**
* Refine the initial alignments:
    * local realignment around indels
    * elimination of molecular duplicates
* Generate the technology-independent SAM/BAM (Sequence Alignment/Map) format.

**Accurate mapping is crucial for variant discovery.**
]
.pull-right-50[
<img src="data:image/png;base64,#data/genvar/Picture21.jpg" width="40%" style="display: block; margin: auto;" />
]

---

## Variant Identification and Analysis

<br>

### Discovery of Raw Variants

<br>

.pull-left-55[
* Analysis-ready SAM/BAM files are analyzed to discover all sites with statistical evidence for an alternate allele present among the samples.
* SNPs, SNVs, short indels, and SVs.
]
.pull-right-45[
<img src="data:image/png;base64,#data/genvar/Picture22.jpg" width="60%" style="display: block; margin: auto;" />
]

---

## Variant Identification and Analysis

<br>

### Discovery of Analysis-Ready Variants

<br>

.pull-left-60[
* Technical covariates, known sites of variation, genotypes for individuals, linkage disequilibrium, and family and population structure are integrated with the raw variant calls from Phase 2 to separate true polymorphic sites from machine artifacts.
* At these sites, high-quality genotypes are determined for all samples.
]
.pull-right-40[
<img src="data:image/png;base64,#data/genvar/Picture23.jpg" width="60%" style="display: block; margin: auto;" />
]

---

## Variant Identification and Analysis

<br>

### Variant Call Format (VCF)

<br>

<img src="data:image/png;base64,#data/genvar/Picture24.jpg" width="100%" style="display: block; margin: auto;" />

---

## Variant Identification and Analysis

<br>

### Header Line

<br>

* The header line names the 8 fixed, mandatory columns (#CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO):

<img src="data:image/png;base64,#data/genvar/Picture25.jpg" width="20%" style="display: block; margin: auto;" />

* If genotype data are present in the file, these are followed by a FORMAT column header, then an arbitrary number of sample IDs.
* The header line is tab-delimited.
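
---

## Variant Identification and Analysis

<br>

### Header Line: A Parsing Sketch

The column rules above can be checked in a few lines of Python. This is a minimal sketch, not a complete VCF parser: it assumes an uncompressed, well-formed file and handles only the `#CHROM` header line plus one data line. The helper names `parse_header_line` and `parse_record` are hypothetical, and the sample names and values are made up. For real data, a maintained library such as pysam is the safer choice.

```python
# Minimal sketch: check a VCF "#CHROM" header line and map one data line
# onto it. Assumes uncompressed text; sample names and values are made up.

FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_header_line(line):
    """Return (sample_ids, has_format) from the tab-delimited #CHROM line."""
    fields = line.rstrip("\n").split("\t")  # columns are tab-separated
    if not fields[0].startswith("#"):
        raise ValueError("header line must start with '#'")
    names = [fields[0].lstrip("#")] + fields[1:]
    if names[:8] != FIXED_COLUMNS:
        raise ValueError("the first 8 columns must be the fixed VCF columns")
    if len(names) == 8:
        return [], False  # sites-only file: no genotype data
    if names[8] != "FORMAT":
        raise ValueError("column 9 must be FORMAT when samples are present")
    return names[9:], True  # any number of sample IDs follow FORMAT


def parse_record(line, sample_ids):
    """Map one tab-delimited data line onto the header's column names."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(FIXED_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")  # several ALT alleles are comma-separated
    if sample_ids:
        keys = fields[8].split(":")  # FORMAT keys, e.g. GT:DP
        record["samples"] = {
            sid: dict(zip(keys, value.split(":")))
            for sid, value in zip(sample_ids, fields[9:])
        }
    return record


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB"
data = "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30\tGT:DP\t0/1:14\t1/1:16"

samples, has_format = parse_header_line(header)
rec = parse_record(data, samples)
print(samples)                    # ['sampleA', 'sampleB']
print(rec["samples"]["sampleA"])  # {'GT': '0/1', 'DP': '14'}
```

Splitting on `\t` rather than on any whitespace follows the tab-delimited rule, and a sites-only VCF (no FORMAT column) simply yields an empty sample list.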
---

## Variant Identification and Analysis

<br>

### Array vs NGS

<br>

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Array: Pros </th>
   <th style="text-align:left;"> Array: Cons </th>
   <th style="text-align:left;"> NGS: Pros </th>
   <th style="text-align:left;"> NGS: Cons </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Relatively inexpensive </td>
   <td style="text-align:left;"> High background, low sensitivity </td>
   <td style="text-align:left;"> Low background, very sensitive </td>
   <td style="text-align:left;"> Expensive </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Easy sample prep </td>
   <td style="text-align:left;"> Limited dynamic range </td>
   <td style="text-align:left;"> Large dynamic range </td>
   <td style="text-align:left;"> Complex sample preparation </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Mature informatics and statistics </td>
   <td style="text-align:left;"> Not quantitative </td>
   <td style="text-align:left;"> Quantitative </td>
   <td style="text-align:left;"> Limited bioinformatics </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> Competitive hybridization </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> Massive information technology infrastructure required </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> Annotation probes </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
  </tr>
</tbody>
</table>

---

## Variant Identification and Analysis

<br>

### Tasks (NGS)

<br>

* Read and discuss the following articles:
    * [DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 43(5):491-8. PMID: 21478889 (2011).](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3083463/)
    * [Narendra M. et al. A Bioinformatics Pipeline for Whole Exome Sequencing: Overview of the Processing and Steps from Raw Data to Downstream Analysis. bioRxiv (2017).](https://www.biorxiv.org/content/10.1101/201145v1.full)
* Hands-on: “Disease-causing mutation” (NGS)

---

### Terminologies

<br>

* **Variation:** any difference between individuals of a particular species.
* **Mutation:** an alteration in the nucleotide sequence of DNA.
* **Alleles:** different versions of the same variant.
* **Reference allele:** the base found in the reference genome.
* **Alternative allele:** any base, other than the reference allele, found at that locus (position).
* **Major allele:** the most common allele for a given SNP.
* **Minor allele:** the less common allele for a given SNP; its frequency is the MAF (minor allele frequency).
* **Genotype:** the genetic make-up of an individual.
* **Phenotype:** the physical traits and characteristics of an individual, influenced by their genotype and the environment.

<!-- --------------------- Do not edit this and below --------------------- -->

---
name: end_slide
class: end-slide, middle
count: false

# Thank you. Questions?

.end-text[
<p class="smaller">
<span class="small" style="line-height: 1.2;">Graphics from </span><img src="./assets/freepik.jpg" style="max-height:20px; vertical-align:middle;"><br>
Created: 28-Apr-2023 • James Ashmore • <a href="https://www.zifornd.com/category/omics-bioinformatics">Bioinformatics</a> • <a href="https://www.zifornd.com">Zifo</a>
</p>
]